DSM + SL =? SC (Or, Can Adding Scalable Locality to Distributed Shared Memory Yield SuperComputer Power?)
Abstract
Distributed Shared Memory, such as that provided by Intel’s Cluster OpenMP, lets programmers treat the combined memory systems of a cluster of workstations as a single large address space. This relieves the programmer of the burden of explicitly transferring data: a correct OpenMP program should still work with Cluster OpenMP. However, by hiding data transfers, such systems also hide a major performance factor: correct OpenMP programs with poor locality-of-reference become correct but intolerably slow Cluster OpenMP programs. Scalable Locality describes the program property of locality that increases with problem size (just as Scalable Parallelism describes the property of parallelism that increases with problem size). In principle, the combination of an optimization that exposes scalable locality and a distributed shared memory system should yield a simple programming model with good performance on a cluster. We have begun to explore a combination of Cluster OpenMP and the Pluto research compiler’s implementation of time tiling, which can produce parallel programs with scalable locality from sequential loop-based dense matrix codes. In this article, we review our approach, discuss our performance model and its implications for tile size selection, and present our most recent experimental tests of the viability of our approach and validity of our performance model. Our performance model captures only machine-independent issues that are critical to setting tile size. It deduces lower bounds on tile dimensions from a combination of purely hardware parameters (e.g., memory bandwidth) and parameters describing the software without reference to any particular hardware (e.g., number of live values produced by the loop nest). We also model load imbalance from OpenMP barriers, which is significant for smaller problems. Our results, while preliminary, are quite encouraging.
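To make the idea of a lower bound on tile dimensions concrete, the following is a minimal sketch and not the performance model of the paper itself. It assumes a square tile of side s in a two-dimensional iteration space (one space dimension plus time), where work grows as s² while the live values leaving the tile grow only as s. The function names, the parameters, and the balance condition (compute time at least equal to transfer time) are all illustrative assumptions.

```python
import math


def compute_time(side, flops_per_iter, peak_flops):
    """Seconds to execute a square tile of side*side iterations (assumed cost model)."""
    return flops_per_iter * side * side / peak_flops


def transfer_time(side, live_per_unit_side, bytes_per_value, bandwidth):
    """Seconds to move the tile's live values, assumed proportional to its side."""
    return live_per_unit_side * side * bytes_per_value / bandwidth


def min_tile_side(flops_per_iter, live_per_unit_side, bytes_per_value,
                  peak_flops, bandwidth):
    """Smallest integer side for which compute time >= transfer time.

    Derived from flops_per_iter * s^2 / peak_flops
                 >= live_per_unit_side * s * bytes_per_value / bandwidth.
    Hardware parameters: peak_flops, bandwidth.
    Software parameters: flops_per_iter, live_per_unit_side, bytes_per_value.
    """
    bound = (live_per_unit_side * bytes_per_value * peak_flops
             / (flops_per_iter * bandwidth))
    return math.ceil(bound)


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    side = min_tile_side(flops_per_iter=4, live_per_unit_side=2,
                         bytes_per_value=8, peak_flops=1e9, bandwidth=1e8)
    print("minimum tile side:", side)
```

Because work grows quadratically in s while traffic grows linearly, any tile at or above the bound keeps computation ahead of communication under these assumptions, which is the sense in which locality scales with tile (and problem) size.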